Version: V11

Understanding Retrieval-Augmented Generation (RAG) in VIDIZMO

Retrieval-Augmented Generation (RAG) in VIDIZMO enables AI-powered chatbots and agents to answer user questions using your organization’s actual portal content (videos, documents, transcripts, captions, and extracted text) rather than relying solely on pre-trained AI knowledge. This makes responses more accurate, relevant, and verifiable.

With RAG, the AI does not “guess.” Instead, it retrieves context from your VIDIZMO content library, processes it through workflows, and generates responses that can include direct citations, timestamps, and source links. This transforms the chatbot from a generic assistant into a domain-aware knowledge system built on your content.

What is RAG in VIDIZMO

Retrieval-Augmented Generation (RAG) is the core AI pattern that powers the VIDIZMO chatbot, where responses are grounded in real, retrievable content from your portal rather than relying solely on a model’s pre-trained knowledge.

In VIDIZMO:

  • RAG = AI responses backed by your portal content.
  • The chatbot first retrieves relevant content from VIDIZMO, including videos, documents, transcripts, captions, OCR-extracted text, and video descriptions.
  • The retrieved content is then used as context for the AI to generate accurate, relevant, and source-backed responses.

In practical terms, your VIDIZMO portal acts as the chatbot’s knowledge base, ensuring answers are contextual, verifiable, and aligned with your organization’s information.

How RAG Works in VIDIZMO

In VIDIZMO, Retrieval-Augmented Generation (RAG) enables the AI chatbot to deliver answers grounded in your actual portal content. When a user asks a question, the system doesn't rely solely on the AI model's pre-trained knowledge. Instead, the workflow embeds the user's query into a vector and searches your portal for semantically relevant content, such as transcripts, documents, video descriptions, or OCR-extracted text.

Once relevant content is identified, the AI uses it as context to generate a response. This approach ensures answers are not only accurate but also verifiable, often including citations or links to the source material.

All of this happens inside workflows, which serve as the engine behind RAG. Workflows define how queries are processed, how content is retrieved, and how responses are generated. They are highly flexible: while one workflow may implement RAG for answering questions, others can perform web searches, combine multiple content sources, or run mashup operations. In essence, RAG is a pattern applied within workflows, allowing VIDIZMO to deliver context-aware, source-backed AI responses while workflows remain adaptable to a variety of tasks.
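The retrieve-then-generate pattern described above can be sketched in a few lines. This is a minimal, self-contained illustration only: `embed`, `search_portal`, and `generate` are hypothetical stand-ins for the workflow's real embedding, search, and LLM nodes, and the bag-of-words "vectors" here are placeholders for actual model embeddings.

```python
# Hypothetical sketch of the RAG pattern a workflow implements.
# embed(), search_portal(), and generate() stand in for real nodes;
# VIDIZMO's actual APIs are not shown here.

def embed(text):
    # Stand-in: a real workflow calls an embedding model. Here we
    # build a simple bag-of-words vector for illustration.
    words = text.lower().split()
    return {w: words.count(w) for w in set(words)}

def search_portal(query_vector, content_index):
    # Rank portal items by overlap with the query vector (stand-in
    # for the Search Mashup node's keyword + vector search).
    def score(item):
        return sum(query_vector.get(w, 0) * c
                   for w, c in item["vector"].items())
    ranked = sorted(content_index, key=score, reverse=True)
    return [item for item in ranked if score(item) > 0]

def generate(question, context_items):
    # Stand-in for the LLM node: a real workflow sends the retrieved
    # context plus the question to a language model.
    sources = ", ".join(item["title"] for item in context_items)
    return f"Answer to '{question}' grounded in: {sources}"

# Tiny illustrative "portal": each item carries a precomputed vector.
portal = [
    {"title": "PTO Policy.pdf",
     "vector": embed("vacation policy paid time off")},
    {"title": "IT Handbook",
     "vector": embed("password reset vpn access")},
]

hits = search_portal(embed("what is the vacation policy"), portal)
answer = generate("what is the vacation policy", hits)
```

The key property the sketch shows: the answer is constructed from retrieved context, so every claim can be traced back to a source item.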

The Three Layers of RAG in VIDIZMO

RAG in VIDIZMO is built around three interconnected layers that together enable context-aware AI responses:

  • Chatbot (User Interface): The point of interaction where users ask questions and receive answers.
  • Agent (Behavior & Configuration Layer): Connects to a workflow, manages system prompts, defines knowledge scope, and controls response formatting and citations.
  • Workflow (Processing Engine): The engine that executes RAG: embedding queries, searching content, processing retrieved data, and generating structured responses. Workflows are designed visually in the Workflow Designer using modular nodes.

Workflows: The Engine Behind RAG

RAG workflows are modular and configurable, typically including nodes such as:

  • User Prompt Node: Captures the user's question.
  • Embedding Node: Converts the user's question into a vector for semantic search.
  • Search Mashup Node: Performs keyword and vector search across VIDIZMO content.
  • LLM Node: Generates answers using the retrieved content as context.
  • Chat Output Node: Streams the response to the user, including source citations.
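To make the node list above concrete, here is a hedged sketch that models each node as a function transforming a shared state object, executed in the order the nodes are wired. The node names mirror the list above, but the execution model and function signatures are assumptions for illustration, not VIDIZMO's actual workflow engine.

```python
# Minimal sketch of a node pipeline: each node is a function that
# transforms a shared state dict. The execution model here is an
# assumption, not VIDIZMO's real engine.

def user_prompt_node(state):
    state["query"] = state["raw_input"].strip()
    return state

def embedding_node(state):
    # Stand-in embedding: real workflows call an embedding model.
    state["query_vector"] = state["query"].lower().split()
    return state

def search_mashup_node(state):
    # Stand-in search: keep items sharing any term with the query.
    state["results"] = [
        item for item in state["index"]
        if any(term in item["text"].lower()
               for term in state["query_vector"])
    ]
    return state

def llm_node(state):
    # Stand-in LLM: cite the retrieved items as context.
    titles = [item["title"] for item in state["results"]]
    state["answer"] = f"Based on {titles}: ..."
    return state

def chat_output_node(state):
    state["output"] = state["answer"]
    return state

WORKFLOW = [user_prompt_node, embedding_node, search_mashup_node,
            llm_node, chat_output_node]

def run_workflow(raw_input, index):
    state = {"raw_input": raw_input, "index": index}
    for node in WORKFLOW:  # nodes run in the order they are wired
        state = node(state)
    return state["output"]

out = run_workflow(
    "vacation policy",
    [{"title": "HR Training Video", "text": "Vacation policy overview"}],
)
```

Because each node only reads from and writes to the shared state, nodes can be rearranged or swapped, which reflects why workflows stay adaptable beyond the RAG pattern.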

Workflow design is flexible and visual: drag-and-drop nodes, connect logic paths, configure parameters, then save and attach the workflow to an agent.

One Agent, Many Possibilities

While each agent connects to a single workflow, workflows can be tailored by:

  • Department: HR, IT, Sales workflows search content relevant to each function.
  • Task Type: Q&A, search, or analysis workflows handle different content processing needs.
  • AI Model: Smaller models for fast responses, larger models for high accuracy and deeper analysis.

This structure allows VIDIZMO to support diverse use cases with a single chatbot platform.

User Interaction: From Question to Answer

When a user asks a question, for example, “What is our vacation policy?”:

  • The chatbot captures the query.
  • The agent applies system settings and routes the request to its workflow.
  • The workflow executes: query embedding → content search → retrieval of relevant videos and documents → LLM generates response with context → response delivered to the chatbot.
  • The user receives an answer with verifiable sources, such as:
    • PTO Policy.pdf (Page 2)
    • HR Training Video (3:24)
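The citation shapes in the example above (a document with a page number, a video with a timestamp) could be assembled along these lines. This is a hypothetical formatting sketch; the field names and citation logic are assumptions, not VIDIZMO's actual response format.

```python
# Hedged sketch of assembling verifiable source citations; the
# source record fields here are made up for the example.

def format_citation(source):
    if source["type"] == "document":
        return f"{source['title']} (Page {source['page']})"
    if source["type"] == "video":
        # Render a seconds offset as a m:ss timestamp.
        minutes, seconds = divmod(source["offset_seconds"], 60)
        return f"{source['title']} ({minutes}:{seconds:02d})"
    return source["title"]

sources = [
    {"type": "document", "title": "PTO Policy.pdf", "page": 2},
    {"type": "video", "title": "HR Training Video", "offset_seconds": 204},
]
citations = [format_citation(s) for s in sources]
```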

This layered approach ensures that responses are accurate, contextually grounded, and traceable to original content, all while maintaining flexible workflow design for multiple purposes.

RAG in the Chatbot vs. Semantic Search in the Portal

Both RAG and semantic search rely on embeddings, but they serve different purposes and operate in different parts of VIDIZMO.

At a high level, RAG happens at query time inside workflows, while semantic search in the portal depends on embeddings generated at content upload. Understanding this distinction helps clarify why one requires the Embedding App and the other does not.

| Feature                   | Where Configured | Embedding App Required? |
| ------------------------- | ---------------- | ----------------------- |
| RAG in Chatbot            | Workflow → Agent | No                      |
| Semantic Search in Portal | Portal Settings  | Yes                     |

How they differ in practice

  • RAG in the chatbot is driven entirely by workflows. When a user asks a question, the workflow dynamically embeds the query, searches for relevant content, and passes that context to the LLM to generate a response. Because this embedding happens inside the workflow at runtime, it does not depend on the Embedding App being enabled.
  • Semantic search in the portal, by contrast, relies on pre-generated embeddings. When content is uploaded, the Embedding App converts transcripts, descriptions, and extracted text into vectors and stores them in the vector database. These stored embeddings make it possible for the portal’s search bar to understand meaning rather than just matching keywords.

In other words:

  • RAG = live, workflow-driven embedding of user questions.
  • Portal semantic search = pre-computed embeddings of your content.

This separation gives VIDIZMO flexibility: you can use RAG-powered AI chat without enabling portal-wide semantic search, or enable both for a richer discovery experience across the platform.

While RAG workflows embed user queries at runtime, vector-based semantic search requires content to have embeddings already stored in the database. The Search Mashup node supports two search modes:

  • Keyword search: Works without content embeddings. Matches exact terms in titles, descriptions, and metadata.
  • Vector search: Requires content embeddings to exist. Finds semantically similar content even without exact keyword matches.

If your workflow uses vector search and content lacks embeddings, the search returns no vector-based results. To ensure full semantic search capability:

  • With Embedding App enabled: Content embeddings generate automatically during upload.
  • Without Embedding App: Only keyword search is available unless embeddings are added through other means.

In summary, the workflow embeds the user's query at runtime. The content embeddings must already exist for vector similarity matching to work.
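The contrast between the two search modes can be illustrated as follows. This is a sketch under stated assumptions: cosine similarity over precomputed content embeddings, with made-up vectors and a made-up threshold; it is not the Search Mashup node's actual implementation.

```python
# Illustrative sketch of keyword vs. vector search. Vectors and the
# similarity threshold are invented for the example.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = (math.sqrt(sum(x * x for x in a))
            * math.sqrt(sum(y * y for y in b)))
    return dot / norm if norm else 0.0

def keyword_search(query, items):
    # Works without embeddings: exact term matching only.
    terms = query.lower().split()
    return [i for i in items
            if any(t in i["text"].lower() for t in terms)]

def vector_search(query_vector, items, threshold=0.5):
    # Requires content embeddings to already exist on each item.
    return [i for i in items
            if i.get("embedding")
            and cosine(query_vector, i["embedding"]) >= threshold]

items = [
    {"title": "PTO Policy.pdf", "text": "paid time off rules",
     "embedding": [0.9, 0.1]},
    {"title": "Old upload", "text": "vacation days explained",
     "embedding": None},  # never embedded, e.g. Embedding App was off
]

# "vacation" matches "Old upload" by keyword, but that item has no
# stored embedding, so vector search can never return it.
kw_hits = keyword_search("vacation", items)
vec_hits = vector_search([1.0, 0.0], items)
```

The asymmetry is the point: the query is embedded fresh at runtime, but an item without a stored embedding is invisible to vector search no matter how relevant it is.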

Summary

RAG in VIDIZMO operates through three distinct layers. At the top is the chatbot, which serves as the user interface where questions are asked and answers are displayed. Behind the scenes, the agent manages behavior and connects the chatbot to a specific workflow, determining how queries are processed.

The workflow contains the RAG processing logic: embedding the user's query, performing keyword and vector search, retrieving relevant content, and generating a response. For vector search to return results, content must have embeddings stored in the database (typically via the Embedding App). Users interact only with the chatbot, while the agent and workflow handle all retrieval and response generation seamlessly.